Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces
Authors
Abstract
Two major approaches have been proposed for processing queries in databases efficiently: speeding up the search with index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations. Indexing techniques are inefficient in extreme configurations, such as high-dimensional spaces, where even a simple scan may be cheaper than an index-based search. Compression techniques are not very efficient in all other situations. We propose to combine both techniques to search for nearest neighbors in a high-dimensional space. For this purpose, we develop a compressed index, called the IQ-tree, with a three-level structure: the first level is a regular (flat) directory consisting of minimum bounding boxes, the second level contains data points in a compressed representation, and the third level contains the actual data. We overcome several engineering challenges in constructing an effective index structure of this type. The most significant of these is deciding how much to compress at the second level. Too much compression leads to many needless, expensive accesses to the third level. Too little compression increases both the storage and the access cost for the first two levels. We develop a cost model, and an optimization algorithm based on it, that permits the degree of compression to be determined independently for each second-level page so as to minimize the expected query cost. In an experimental evaluation, we demonstrate that the IQ-tree performs as the "best of both worlds" for a wide range of data distributions and dimensionalities.
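To make the role of the second level concrete, here is a minimal sketch, assuming a uniform grid over each page's minimum bounding box with the same number of bits per dimension; the IQ-tree's exact encoding and its cost model are not reproduced here. A point is stored only as a cell index, the distance from the query to that cell is a lower bound on the true distance, and only points the bound cannot exclude need a third-level access. A range query of radius r stands in for the pruning step of a nearest-neighbor search, where r would be the current nearest-neighbor distance. The function names (mbr, quantize, min_dist, candidates) are hypothetical, and the bits parameter is a hypothetical stand-in for the per-page degree of compression that the cost model would choose.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def mbr(page: Sequence[Point]) -> Box:
    """Minimum bounding box of one data page (the first-level directory entry)."""
    dims = range(len(page[0]))
    lower = tuple(min(p[i] for p in page) for i in dims)
    upper = tuple(max(p[i] for p in page) for i in dims)
    return lower, upper


def quantize(p: Point, box: Box, bits: int) -> Tuple[int, ...]:
    """Second-level code: one grid-cell index per dimension.

    Assumption (illustrative only): each dimension of the page MBR is split into
    2**bits equal-width cells; this is not claimed to be the IQ-tree's encoding.
    """
    cells = 1 << bits
    code = []
    for x, lo, hi in zip(p, *box):
        width = (hi - lo) / cells
        # A zero-width extent (all points share this coordinate) needs one cell.
        code.append(0 if width == 0 else min(int((x - lo) / width), cells - 1))
    return tuple(code)


def min_dist(q: Point, code: Tuple[int, ...], box: Box, bits: int) -> float:
    """Lower bound on the Euclidean distance from q to any point with this code."""
    cells = 1 << bits
    total = 0.0
    for x, c, lo, hi in zip(q, code, *box):
        width = (hi - lo) / cells
        cell_lo, cell_hi = lo + c * width, lo + (c + 1) * width
        if x < cell_lo:
            total += (cell_lo - x) ** 2
        elif x > cell_hi:
            total += (x - cell_hi) ** 2
    return math.sqrt(total)


def candidates(page: Sequence[Point], q: Point, radius: float, bits: int) -> List[int]:
    """Indices the compressed codes cannot exclude from a range query of `radius`.

    Each of these would need a (costly) third-level access to its exact coordinates.
    """
    box = mbr(page)
    codes = [quantize(p, box, bits) for p in page]
    return [i for i, c in enumerate(codes) if min_dist(q, c, box, bits) <= radius]


if __name__ == "__main__":
    random.seed(0)
    page = [tuple(random.random() for _ in range(8)) for _ in range(200)]
    q = tuple(random.random() for _ in range(8))
    radius = 0.8
    hits = sum(1 for p in page if math.dist(p, q) <= radius)
    for bits in (1, 2, 4, 8):
        n = len(candidates(page, q, radius, bits))
        print(f"{bits} bits/dim: {n} candidates to refine, {hits} true hits")
```

Run as a script, it prints, for each bit width, how many of the 200 random points would still need a third-level lookup. Fewer bits per dimension make the second level smaller but the bounds looser, so more candidates survive; this is the trade-off the abstract describes, which the IQ-tree's cost model settles separately for each page.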
Similar resources
Compression of High-dimensional Data Spaces Using Non-differential Augmented Vector Quantization
Most data-intensive applications are confronted with the problems of I/O bottleneck, poor query processing times and space requirements. Database compression alleviates this bottleneck, reduces disk space usage, improves disk access speed, speeds up query response time, reduces overall retrieval time and increases the effective I/O bandwidth. However, random access to individual tuples in a com...
Compressing High-Dimensional Data Spaces Using Non-Differential Augmented Vector Quantization
Most data-intensive applications are confronted with the problems of I/O bottleneck, poor query processing times and space requirements. Database compression has been discovered to alleviate the I/O bottleneck, reduce disk space, improve disk access speed, speed up query, reduce overall retrieval time and increase the effective I/O bandwidth. However, random access to individual tuples in a com...
A High Performance Image Data Compression Technique for Space Applications
A highly performing image data compression technique is currently being developed for space science applications under the requirement of high-speed and pushbroom scanning. The technique is also applicable to frame based imaging data. The algorithm combines a two-dimensional transform with a bitplane encoding; this results in an embedded bit string with exact desirable compression r...
Wavelet-based ECG data compression optimization with genetic algorithm
With a direct impact on compression performance, optimal quantization scheme is crucial for transform-based ECG data compression. However, traditional optimization schemes derived with signal adaption are commonly inherent with signal dependency and unsuitable for real-time application. In this paper, the variety of arrhythmia ECG signal is utilized for optimizing the quantization scheme of wave...
Compact Description and Modeling of Multidimensional Information
We present a technique to compress data defined as functions in high dimensional spaces. The objects in the spaces are represented by manifolds. Traditionally, data compression methods have been applied to functions defined on simple manifolds such as the real line (e.g., audio), a rectangle (e.g., images), or a three-dimensional open-ended box (e.g., video). However, many conventional data compr...
Publication date: 2000